CLAIMS

What is claimed is:

1. A method for identifying a repeat sequence, the method comprising the steps of:
selecting a query sequence;
testing said query sequence with a redundant file;
identifying sequences in the redundant file that contain a similar sequence to a portion of the query sequence, wherein said identified sequences and said similar portion of the query sequence make up a pairwise sequence alignment;
aligning all the identified pairwise sequence alignments;
designating the right and left endpoints of each identified sequence and any intervening sequences;
identifying a position within the query sequence corresponding to each endpoint;
defining regions within the query sequence, wherein a region is a sequence between two consecutive positions matching two endpoints; and
identifying each region having at least five sequence matches in the identified pairwise alignments as a repeat sequence.
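
By way of illustration only, the endpoint and region steps of claim 1 might be sketched as follows. The sketch assumes each pairwise alignment has already been reduced to a half-open interval `(start, end)` on the query sequence, and that a "sequence match" for a region means an alignment spanning that whole region. The function name `find_repeat_regions` and the parameter `min_matches` are hypothetical and do not appear in the claims.

```python
from typing import List, Tuple

# Assumption: each alignment is a half-open interval [start, end) on the query.
Interval = Tuple[int, int]


def find_repeat_regions(
    hits: List[Interval], min_matches: int = 5
) -> List[Tuple[int, int, int]]:
    """Return (start, end, match_count) for regions with >= min_matches hits."""
    # Every left and right endpoint becomes a position on the query.
    positions = sorted({p for start, end in hits for p in (start, end)})
    regions = []
    # A region is the span between two consecutive endpoint positions.
    for left, right in zip(positions, positions[1:]):
        # Count the alignments that span the whole region.
        count = sum(1 for start, end in hits if start <= left and right <= end)
        if count >= min_matches:
            regions.append((left, right, count))
    return regions


if __name__ == "__main__":
    example_hits = [(0, 100), (20, 80), (30, 70), (30, 90), (40, 60), (35, 65)]
    print(find_repeat_regions(example_hits))
    # prints [(35, 40, 5), (40, 60, 6), (60, 65, 5)]
```

In this example only the three central regions reach the threshold of five matches, so only those would be designated as repeat sequence under the stated assumptions.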




2. A method for constructing a repeat database comprising:
selecting a query sequence;
selecting known repeat sequences;
adding known repeat sequences into a repeat sequence database;
masking said query sequence with repeat sequences in the repeat sequence database;
testing said masked query sequence with a redundant file;
identifying sequences in the redundant file that contain a similar sequence to a portion of the query sequence, wherein said identified sequences and said similar portion of the query sequence make up a pairwise sequence alignment;
aligning all the identified pairwise sequence alignments;
designating the right and left endpoints of each identified sequence and any intervening sequences;
identifying a position within the query sequence corresponding to each endpoint;
defining regions within the query sequence, wherein a region is a sequence between two consecutive positions matching two endpoints;
identifying any two successive regions having a large variance in the number of sequence matches; and
adding the sequence within the region of the two successive regions having the highest number of sequence matches into the repeat sequence database.
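
As a non-limiting illustration, the masking step of claim 2 might be sketched as follows. The sketch assumes that masking means replacing every exact occurrence of a known repeat in the query with a placeholder character; a practical implementation would more likely locate repeats by sequence alignment rather than exact string matching. The function name `mask_query` and the default mask character `N` are hypothetical.

```python
from typing import List


def mask_query(query: str, repeats: List[str], mask_char: str = "N") -> str:
    """Replace each exact occurrence of a known repeat with mask_char."""
    masked = list(query)
    for repeat in repeats:
        if not repeat:
            continue  # an empty repeat would match everywhere; skip it
        # Search the original query so overlapping occurrences are all found.
        start = query.find(repeat)
        while start != -1:
            masked[start:start + len(repeat)] = mask_char * len(repeat)
            start = query.find(repeat, start + 1)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_query("ACGTACGTTTGA", ["ACGT"]))
    # prints NNNNNNNNTTGA
```

The masked query can then be tested against the redundant file so that already-known repeats do not contribute further pairwise alignments.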

3. The method of claim 2, wherein the large variance in the number of sequence matches is equal to 5 or more.
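
For illustration only, the comparison of successive regions in claims 2 and 3 might be sketched as follows. The sketch assumes that "variance" is read as the absolute difference between the match counts of two successive regions, and that regions are supplied as `(start, end, match_count)` tuples ordered along the query. The names `select_repeat_candidates` and `min_difference` are hypothetical.

```python
from typing import List, Tuple

# Assumption: (start, end, match_count), ordered along the query sequence.
Region = Tuple[int, int, int]


def select_repeat_candidates(
    regions: List[Region], min_difference: int = 5
) -> List[Region]:
    """Pick the higher-count region of each successive pair that differs enough."""
    selected: List[Region] = []
    for current, following in zip(regions, regions[1:]):
        # Assumption: "large variance" means an absolute difference in counts.
        if abs(current[2] - following[2]) >= min_difference:
            best = current if current[2] > following[2] else following
            if best not in selected:
                selected.append(best)
    return selected


if __name__ == "__main__":
    example = [(0, 20, 1), (20, 30, 2), (30, 50, 9), (50, 60, 8), (60, 80, 2)]
    print(select_repeat_candidates(example))
    # prints [(30, 50, 9), (50, 60, 8)]
```

The query subsequence spanned by each selected region would then be the candidate for addition to the repeat sequence database.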

4. A database product of the process of claim 2.

5. The method of claim 1 or 2, wherein said sequence is a deoxyribonucleotide sequence.

6. The method of claim 1 or 2, wherein said sequence is a ribonucleotide sequence.
7. The method of claim 1 or 2, wherein said sequences are derived from animal DNA or RNA.

8. The method of claim 7, wherein said animal is a human.

9. The method of claim 8, wherein said animal is a mouse.

10. The method of claim 1 or 2, wherein said sequences are derived from plant DNA or RNA.

11. The method of claim 10, wherein said plant is a single-cell plant.

12. The method of claim 1 or 2, wherein said sequences are derived from fungal DNA or RNA.

13. The method of claim 1 or 2, wherein said sequences are derived from DNA or RNA of a microorganism or virus.

14. The method of claim 1 or 2, wherein said sequences are derived from DNA or RNA of a single-cell eukaryote.

15. The method of claim 1 or 2, wherein said sequences are derived from synthetic man-made DNA or RNA.
16. The method of claim 1 or 2, wherein said sequences are postulated based upon amino acid sequences.

17. The method of claim 2, wherein said database is encoded in a biological medium.

18. The method of claim 2, wherein said database is encoded in a written medium.

19. The method of claim 2, wherein said database is encoded in an electronic medium.

20. The method of claim 19, wherein said electronic medium is a computer-readable medium.

21. The method of claim 20, wherein said computer-readable medium is addressable through an internet connection.

22. The method of claim 1 or 2, wherein said redundant file is a Public Domain Database.

23. The method of claim 22, wherein said Public Domain Database is GenBank.

24. The method of claim 22, wherein said Public Domain Database is dbEST.

25. The method of claim 22, wherein said Public Domain Database is TIGR.

26. The method of claim 22, wherein said Public Domain Database is SwissProt.
27. The method of claim 1 or 2, wherein sequence comparisons are carried out using a Database Search Algorithm.

28. The method of claim 27, wherein said Database Search Algorithm is BLAST.

29. The method of claim 27, wherein said Database Search Algorithm is FASTA.

30. The method of claim 27, wherein said Database Search Algorithm is Smith-Waterman.

31. The method of claim 1 or 2, wherein said sequence comparisons are carried out utilizing a Scoring Matrix Program.

32. The method of claim 31, wherein said Scoring Matrix Program is PAM.

33. The method of claim 31, wherein said Scoring Matrix Program is BLOSUM.

34. The process of Figure 2.

35. A repeat sequence product of the process of claim 1.

36. A kit for analyzing nucleotide sequences comprising:
an electronic medium readable by a computer, said medium encoding a database produced by the method of claim 2.

37. A kit for analyzing nucleotide sequences comprising:
an electronic medium readable by a computer, said medium encoding a database produced by the method of claim 2; and,
instructions for the use of said database.

38. A kit for analyzing nucleotide sequences comprising:
an electronic medium readable by a computer, said medium encoding a database produced by the method of claim 2;
instructions for the use of said database; and,
a computer.

39. An improved database of nucleotide sequences, the improvement consisting of repeat sequences containing a similar sequence to a portion of a query sequence, wherein said identified sequences and said similar portion of the query sequence make up a pairwise sequence alignment, and wherein all identified pairwise sequence alignments have right and left endpoints of each identified sequence and any intervening sequences.